Development and validation of a prognostic scoring system for patients with colorectal cancer hepato-pulmonary metastasis: a retrospective study

Background Hepato-pulmonary metastasis of colorectal cancer (CRC) is a rare disease with poor prognosis. This study aims to establish a highly efficient nomogram model to predict overall survival (OS) and cancer-specific survival (CSS) in patients with colorectal cancer hepato-pulmonary metastasis (CRCHPM). Methods We retrospectively analyzed the data of patients with CRCHPM from SEER database and Wuhan Union Hospital Cancer Center (WUHCC). A total of 1250 CRCHPM patients were randomly assigned to the training, internal validation, and external validation cohorts from 2010 to 2016.Univariate and multivariate cox analysis were performed to identify independent clinicopathological predictors of OS and CSS, and a nomogram was constructed to predict OS and CSS in CRCHPM patients. Results A nomogram of OS was constructed based on seven independent predictors of age, degree of differentiation, T stage, chemotherapy, number of lsampled lymph nodes, number of positive lymph nodes, and tumor size. Nomogram showed favorable sensitivity in predicting OS at 1, 3 and 5 years, with area under the receiver operating characteristic curve (AUROC) values of 0.802, 0.759 and 0.752 in the training cohort;0.814, 0.769 and 0.716 in the internal validation cohort;0.778, 0.756 and 0.753 in the external validation cohort, respectively. A nomogram of CSS was constructed based on three independent predictors of T stage, chemotherapy, and tumor size. The AUROC values of 1, 3 and 5 years were 0.709,0.588,0.686 in the training cohort; 0.751, 0.648,0.666 in the internal validation cohort;0.781,0.588,0.645 in the external validation cohort, respectively. Calibration curves, Concordance index (C-index), and decision curve analysis (DCA) results revealed that using our model to predict OS and CSS is more efficient than other single clinicopathological characteristics. Conclusion A nomogram of OS and CSS based on clinicopathological characteristics can be conveniently used to predict the prognosis of CRCHPM patients.

reduced [1]. Advanced colorectal cancer is still a fatal disease. Despite improvements in survival rates at diagnosis of metastatic disease over the past 20 years, significant heterogeneity in survival outcomes persists [2]. With the change of living habits and diet, the prevalence of CRC is increasing and the incidence is gradually younger [3]. In February 2022, the National Cancer Center released the latest issue of cancer statistics in China. In general, the incidence of CRC is 10.67% in men and 9.26% in women, and the overall incidence of CRC is 10.04% [4]. Feng et al. [5] analyzed the current cancer situation in China according to the data provided by Cancer Today Institute in 2018. The incidence of CRC is 12.8% in men, that 11.3% in women, and the overall incidence of CRC is 12.2%. CRC is one of the cancers with high morbidity and mortality in China. In addition, CRC is also a cancer with a high rate of metastasis, with distant metastasis further reducing patient survival. Population-based studies have shown that approximately 25-30% of patients diagnosed with CRC developed liver metastases during the course of the disease [6,7]. The lung is the second most common site of metastasis for CRC second only to the liver. Compared with colon cancer, rectal cancer has a higher incidence of lung metastases. In China, the percentage of rectal cancer cases is about 50%, much higher than in Western countries nearly 30%, the burden of cancer has increased significantly [8].
Although the OS of CRC patients with distant metastasis is poor, the prognosis of patients with metastatic colorectal cancer has improved over the past few years due to improved surgical techniques, enhanced perioperative care, and huge developments in treatment modalities. In particular, patients with unresectable liver or lung metastasis at the initial stage can achieve tumor reduction through various chemotherapy methods, thus achieving tumor resection and improving survival rate [8][9][10][11]. In the 1990s, 2-year OS for stage IV colorectal cancer was only 21%, but in the 2010s, 5-year OS increased to 35-40% [12,13]. At present, the prognosis of patients with metastatic CRC has been improved, however, it is mainly for CRC patients with single organ metastasis, and the prognosis of CRC patients with multiple organ metastasis is still poor [14,15].
Patients with hepato-pulmonary metastases from CRC are rare and have a poor prognosis, which can lead to a significant decrease in survival rates with various treatment modalities and thus place a significant burden on the healthcare system. Our inability to accurately predict prognosis has been identified as a significant barrier to effective physician-patient communication [16]. Therefore, accurately predicting the prognosis of patients with CRCHPM is crucial for treatment selection and effective patient-provider communication. However, it is difficult to predict the OS for mixed advanced cancer patients with limited life expectancy and poor physical states [17].
The prognosis of CRC is related to lymph nodes, TNM stage, radiotherapy, and primary location of the tumor, but individual indicators are not sufficiently predictive of prognosis. Engstrand J et al. [18] used cox regression to predict survival rates and showed that survival rates after liver metastases in CRC varied by location. Tang et al. [19] identified 11 prognostic factors affecting liver metastasis from CRC by creating a nomogram and suggested that gender is not only a risk factor but also a prognostic correlate of liver metastasis from CRC. In our previous studies, we also found that a nomogram constructed based on clinicopathological characteristics could be easily used to predict OS and CSS in patients with liver metastasis from CRC [20]. In addition, related studies have also predicted the prognosis of patients with CRC lung metastasis based on clinicopathological characteristics. Huang et al. [21] analyzed clinicopathological characteristics by multivariate logistic regression analysis and found that 9 clinicopathological characteristics were independent risk factors for lung metastasis of CRC, and nomogram construction based on these indicators can effectively predict prognosis. Wang et al. [22] constructed a nomogram model combining machine learning-pathology, radiological features, immune score and clinical factors to predict the prognosis of CRC patients after lung metastasis, and found that this nomogram could be used to predict OS and DFS after lung metastasis of CRC patients. However, these studies only predicted the prognosis of patients with single organ metastasis of CRC, and did not study the prognosis of patients with multiple organ metastasis of CRC. To our knowledge, there are few studies on the prognosis of CRC with multiple organ metastasis, and there is a lack of their own clinical data to verify [23,24]. Due to the poor prognosis of CRCHPM, the study on the prognosis of CRCHPM deserves further exploration.
In our study, patients with CRCHPM were divided into different cohorts based on the public database and the data of our medical center. Then accurate and effective OS and CSS prediction nomograms of patients with CRCHPM were established according to clinicopathological characteristics, and the predictive value of the prediction nomograms was further evaluated.

Study design and data collection
In this study, a total of 1250 patients with CRCHPM were enrolled, including 1187 from the SEER dataset and 63 from the WUHCC cohort. The selection criteria included: CRC patients diagnosed with hepato-pulmonary metastasis at all ages from 2010 to 2016.A detailed flow diagram of the patient selection process is shown in Fig. 1.
Age, sex, tumor location, degree of differentiation, histological type, tumor size, TNM, T stage, N stage, M stage, radiotherapy, chemotherapy, number of sample lymph nodes, number of positive lymph nodes, primary surgery, and metastatic surgery were analyzed in all three cohorts. Patients in the SEER database were randomly divided into training cohort and internal validation cohort, and eligible patients from the WUHCC cohort were used as external validation cohort. The model was established according to the training cohort, and then the model was verified by two validation cohorts. The study protocol was reviewed and approved by the Ethics Committee and institutional Review Committee of Wuhan Union Hospital (No.2018-S377). All patients provided written informed consent and all procedures performed in studies involving human participants were accordance with the Helsinki Declaration.

Nomogram establishment and validation
Univariate and multivariate cox regression analyses were used to calculate the impact of each clinicopathological characteristics on OS and CSS in the training, internal and external validation cohorts. The impact of each variable on CSS and OS was measured as a hazard ratio (HR) and used to identify independent prognostic risk factors. A nomogram was established based on the independent prognostic clinicopathological characteristics obtained by training cohort multivariate cox regression. The total points of each patient in the training, internal validation and external validation cohorts was calculated according to the established nomogram, and cox regression analysis was performed on the three cohorts using the total points as a parameter.

AUROC, calibration curve, C-index and DCA analysis
The results of nomogram calibration are evaluated by Hosmer-Lemeshow test and displayed in the form of calibration curves. The ROC curves of 1-, 3-and 5 years are used to show the accuracy of nomogram's time prediction ability, and AUROC was used to evaluate nomogram's ability to predict OS and CSS in CRCHPM. In addition, c-index was defined as the ratio of all patient pairs predicted to conform to the results, DCA has recently been proposed as a new way to visualize the potential clinical value of risk prediction models. Therefore, the above methods were adopted in this study to verify the clinical efficacy of nomogram.

Statistical analysis
Categorical variables were expressed as number and percentage and Chi-square test or Fisher's exact test was used for comparison. Univariate and multivariate cox regression analysis was used to calculate the influence of variables on OS and CSS. The impact of each variable was evaluated by HR. In the multivariate cox regression model, the method of stepwise selection is adopted to filter the variables (p < 0.05). Statistical Package for the Social Sciences (SPSS) 22.0 (IBM Corp., Armonk, NY, USA) and R 4.0.2 (R Foundation for Statistical Computing, Vienna, Austria) were used for Statistical analysis. The R packages used were "rms", "survival", "survminer", "regplot", "ggDCA", "timeROC" and "survcomp". All statistical tests were two-sided, and the statistical significance was set to 0.05.

Prognosis and clinicopathological characteristics of patients with CRCHPM
First, we analyzed the OS of CRCHPM and colorectal cancer liver metastasis (CRCLM) in SEER database and WUHCC cohort from 2010 to 2016. 1187 and 9332 patients with CRCHPM and CRCLM occurred in SEER database, respectively.63 and 340 patients in WUHCC cohort respectively. And we found that the OS of patients with CRCHPM was significantly worse than that of patients with CRCLM in both cohorts (Fig. 1A, B). Subsequently, patients with CRCHPM were analyzed. In the SEER database, 70% (n = 831) of patients were randomly assigned to the training cohort, while the remaining patients (n = 351) were included in the internal validation cohort. Patients (n = 63) from the WUHCC cohort were included in the external validation cohort. The detailed flow diagram of the standard procedures for patient inclusion and exclusion is shown in Fig. 1C.
The mortality was 81.6 and 66.76% for all CRCHPM patients in the SEER and WUHCC cohorts, respectively. Most patients were older than 60 years (692,58.2%;41;65.1%) and male (632,53.2%;33,52.4%) in the SEER cohorts and WUHCC cohorts, respectively. In the SEER cohorts, compared with alive patients, dead patients were more likely to have larger tumor diameter, higer T stage, higher rates of postive lymph node and tumors localized more commonly in the colon. In the WUHCC cohorts, dead patients were more likely to have higer N satge, higher rates of sampled lymph node metastasis and tumors localized also more commonly in the colon. In both the SEER and WUHCC cohorts, most patients received neoadjuvant chemotherapy (70.6 and 65.1%, respectively), while fewer patients received radiotherapy (6.0 and 9.5%, respectively, Table 1).

Independent prognostic factors in patients with CRCHPM
According to univariate cox regression analysis of the training cohort, age, tumor location, degree of differentiation, T stage, N stage, tumor size, radiotherapy, chemotherapy, number of sample lymph nodes, number of positive lymph nodes were closely related to OS. Multivariate cox regression analysis showed that age, degree of differentiation, T stage, chemotherapy, number of sample lymph nodes, number of positive lymph nodes and tumor size were independent prognostic factors of CRCHPM. In the internal validation cohort, univariate cox analysis indicated that age, degree of differentiation, T stage, N stage, chemotherapy, number of sample lymph nodes, number of positive lymph nodes, Primary surgery was closely related to OS. Multivariate cox analysis showed that age, degree of differentiation, T stage, chemotherapy and number of positive lymph nodes were independent prognostic factors of CRCHPM. Multivariate analysis in an external validation cohort suggested that age, tumour location, degree of differentiation, N stage, chemotherapy and Metastatic surgery were independent prognostic factors ( Fig. 2A).
In the CSS study, univariate and multivariate cox regression analysis of the training cohort indicated that T stage, chemotherapy, and tumor size were independent prognostic factors. Univariate and multivariate cox analysis of internal validation cohort indicated that T stage, chemotherapy and number of positive lymph nodes were independent prognostic factors. Externally verified multivariate analysis suggested that the degree of differentiation was an independent prognostic factor (Fig. 2B).

Construction of the nomogram
Develop a nomogram of OS and CSS based on independent prognostic factors of OS and CSS determined by multivariate cox regression of training cohort. According to this nomogram, we can obtain the risk score of each variable, and then add the risk score of each variable to get the total score to obtain the probability of CSS and OS at 1, 3, and 5 years for each patient (Fig. 3).

Comparison of nomogram with TNM stage and chemotherapy
TNM stage and adjuvant chemotherapy are important indicators to evaluate the prognosis of tumor metastasis. We found that the TNM stage of CRCHPM patients was IVB, and most of the patients with IVB received adjuvant chemotherapy. To confirm whether nomogram can predict the prognosis of CRCHPM patients better than TNM stage and chemotherapy, time-dependent ROC analysis was performed at 1, 3 and 5 years, respectively.
The 1-, 3-, and 5-year AUC values of the nomogram for the prediction of OS in the training cohort were 0.802,   (Fig. 4A).
The ROC curve represents the discrimination ability of nomogram. We found that nomogram showed good sensitivity in predicting OS and CSS at 1, 3, and 5 years in the training cohort, internal validation cohort, and external validation cohort, and was significantly better than TNM stage and chemotherapy (Fig. 4).

Evaluation and validation of prediction OS and CSS nomogram
C-index values and calibration curves are usually used to evaluate the discrimination of nomogram. In the training cohort, internal validation cohort, and external validation cohort, The c-indexes for predicting OS were 0.719 (95%CI 0.698, 0.740), 0.733 (95%CI 0.702, 0.763), and 0.737 (95%CI 0.661, 0.813), respectively. In addition, the c-indexes of CSS prediction were 0.678 (95%CI 0.632, 0.723), 0.739 (95%CI 0.685, 0.793) and 0.722 (95%CI 0.654, 0.824) in the training cohort, internal verification cohort and external verification cohort, respectively. This indicated that nomogram has a good ability to identify CRCHPM patients. In addition, nomogram's calibration curve has no obvious deviation from the reference line in the three cohorts, showing high reliability (Fig. 5).
It is a new method to evaluate the diagnostic accuracy of diagnostic models, which is superior to AUROC in clinical value evaluation. In training, internal validation and external validation cohorts, the nomogram DCA score has a higher net benefit than the TNM stage, indicating that it has better clinical value than the TNM stage (Fig. 6).

Discussion
CRC is the second most lethal neoplastic disease in the world [25]. The prognosis of colorectal cancer is still unsatisfactory, despite the rapid development of treatment strategies in recent years [26][27][28]. Patients with metastatic CRC had worse survival outcomes compared with patients without metastases, with a 5-year survival rate of only 14.0% [29]. At present, there are many studies on the distant metastasis of CRC mainly focusing on the CRC liver metastasis. Although the incidence of CRCHPM or multiple metastases is low, we found that the CRCHPM prognosis is worse than liver metastases. There is no study can effectively predicted the prognosis of CRCHPM. We combined the SEER database with our medical center data on the prognosis of CRCHPM patients to identify independent clinicopathological predictors by univariate and multivariate cox analysis and to construct a nomogram to predict CSS and OS. And the results showed that our nomogram based on simple clinicopathological characteristic can accurately predict the 1-year, 3-year, 5-year OS and CSS of CRCHPM patients. At the same time, we used DCA to compare the predictive ability of nomogram with TNM Stage and chemotherapy, and the results showed that our nomogram had a higher predictive ability for the prognosis of CRCHPM than TNM stage and chemotherapy.
There are also relevant literature reports on the prognosis prediction of patients with distant metastasis of colorectal cancer. Liu et al. [30] constructed a nomogram based on 14 factors to predict OS for distant metastasis of CRC, the nomogram was a good predictor of survival in patients with stage IV CRC, but the study was deficient in predicting only OS and lacked an external validation cohort, and did not compare the predictive effect of nomogram with individual factors such as TNM stage and chemotherapy. Tang et al. [19] established nomogram based on 14 prognostic factors of synchronous liver metastasis, and the study found that bone and lung metastases were positively correlated with the risk of synchronous liver metastases and closely related to OS. But the same shortcomings of this study are the lack of an external validation cohort. Wu et al. [31] established nomogram based on 9 factors to predict the OS and CSS of CRC patients with liver metastases only. The study found that compared with the TNM stage system, the constructed nomogram had better prediction performance in predicting OS and CSS. However, this study lacked external validation and the value of C-index of the training and validation sets was not high. Mo et al. [32] used nomogram to predict the OS of metastatic CRC when they predicted the metastatic location of CRC. The results showed that the predictive ability of nomogram for OS and CSS is better than TNM stage were consistent with our research. The conformation of a nomogram based on appropriate clinicopathological characteristic can predict the prognosis of patients with distant metastases in CRC. However, most studies also did not have an external validation cohort and did not predict important prognostic indicators such as CSS. In addition, there are similar results yielded by several studies which are only on OS or single metastases and lack external validation cohort [33][34][35]. Our model based on clinicopathological characteristic can effectively predict the prognosis of CRCHPM, which has important clinical significance.
Our study found that T stage, tumor size and chemotherapy were critical for the prognosis of CRCHPM. A study showed that the larger the tumor is, the less likely patients will have a good prognosis and the greater risk they have liver, lung and brain metastases from tumor [32]. However, other studies have found the opposite. Li et al. [36] retrospectively studied the relationship between the size of resectable tumors at different sites and prognosis. Their research showed that smaller tumors were associated with poorer OS, CSS, and DFS. Further studies are needed on tumor size and prognosis. Campbell et al. [37] combined T-stage and venous invasion to predict postoperative prognosis in patients with CRC. Univariate analysis showed that T-stage was associated with poorer OS and CSS in patients. Li et al. [38] showed that both T and N stages significantly affected OS in patients with CRC, but the weight of T stage was greater than that of N stage. Wu et al. [39] studied the relationship between early T stage and prognosis in patients with liver metastases from colorectal cancer. The results showed that the prognosis of patients with T1 stage metastasis was significantly lower than that of T3/T4 stage metastasis patients, and early T stage was an independent prognostic factor for low survival rate. Chemotherapy is one of the most common treatment options for patients with CRC metastases. Studies show that in patients with stage II colorectal cancer, regardless of prior treatment, patients' age, or high-risk pathological risk characteristics, improved OS is associated with adjuvant chemotherapy [40]. In a retrospective study with a median follow-up of 37.0 months, the OS rates of CRC patients with and without adjuvant chemotherapy were 62.1 and 40.4%, respectively [41]. This is consistent with our findings. However, some studies have also shown that adjuvant chemotherapy for CRC patients with lung metastases has no significant effect on OS [42][43][44]. The benefit of chemotherapy in patients with CRC metastases requires further study. The inclusion of appropriate clinicopathological indicators has important clinical significance in predicting the prognosis of CRCHPM.
However, this study still has some limitations. One is the small sample size of our external validation cohort, which is mainly due to the low incidence of CRCHPM. At the same time, three cohorts had some data missing and could not be included in the analysis. Second, this is a retrospective study based on limited clinical records, and further multicenter prospective clinical studies are needed to demonstrate the clinical validity of this model. Third, we only include clinicopathological characteristics, but also include molecular pathological characteristics (KRAS, BRAF, Mismatch repair/ microsatellite instability) in variable analysis to further improve the prediction effect of nomogram. However, our study is also clinically important because there are very few studies to predict the prognosis of CRCHPM patients, and we are the first to propose the application of clinicopathological characteristics to predict OS and CSS with good predictive effect. In addition, we have our own large cohort data from which we screened CRCHPM patients with external validation from our own large sample of more than 3000 patients, making our study more credible. Our study developed an adjunct model to predict the prognosis of CRCHPM, but this adjunct should be used with caution based on the patient's overall situation.

Conclusions
In conclusion, we establish and validate a nomogram to predict CSS and OS in patients with CRCHPM based on significant clinicopathological characteristics. In addition to its excellent predictive power, this novel nomogram has sufficient discriminatory and calibration capabilities, it may provide valuable prognostic and predictive information to aid treatment strategies for CRCHPM patients.